1  Introduction to R and RStudio

1.1 The story of R

Before installing and using R, let’s discuss R itself. Unlike many other programming languages, such as C and Java, R was created by statisticians. More specifically, Ross Ihaka and Robert Gentleman created R as a free and open-source language for statistical computing and data analysis, which made it accessible to a wide audience of researchers, statisticians and data analysts. R began as a re-implementation of the S programming language, with some modifications and improvements. Its main focus is therefore statistical analysis and data visualization, making it an excellent choice for both data analysts and statisticians.

1.2 Why R

R is one of the best options for data scientists and data analysts because it is free, efficient, specialized in statistical analysis and machine learning, and runs on most platforms (Windows, macOS, Linux). Furthermore, the large and active community surrounding R offers plenty of resources for learning and support. Additionally, it is easy for developers to share add-ons, giving R users early access to the latest tools and methods in data science from various fields. Lastly, R makes it easy to share code, usually in the form of a file called a script. A script serves as a complete record of the analysis we have conducted, which is crucial for reproducible work and research.

1.3 Installing R

Now that we have a first impression of what R is, let’s see how we can start using it. First, we need to install R on our computer, a task that is quick and easy. We can do so by clicking the link below and downloading the latest version from the official website (the link points to the Windows installer; installers for other operating systems are available from the same site). During the installation process, it is advisable to simply accept the default options: https://cran.r-project.org/bin/windows/base/

After the installation is complete, we can use R immediately. When we open R, we see its console, which looks like this:

On the console, we type commands and press Enter to execute them. For instance, we can type “2+2” or “3 + 1” (leaving spaces between the characters does not make any difference) and then press Enter to see that the result in both cases is “4”:
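
As a minimal sketch of this interaction, the two commands and the output the console prints for each are shown below. The `[1]` in front of the result is simply R's way of numbering the values it prints.

```r
# Spaces around the operator make no difference to the result
2+2
# [1] 4

3 + 1
# [1] 4
```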

Each time we press Enter, the code on the current line runs. Of course, we will use R for much more complex tasks than simple arithmetic. Although we could start working in the R console, it is much more convenient to work with RStudio.

1.4 Installing RStudio

RStudio is an integrated development environment (IDE), providing a user-friendly interface and tools to facilitate code writing, data analysis, and visualization. In simpler terms, RStudio is a tool that can help us use R in a much more convenient and flexible way. To really understand the difference between R and RStudio, we can imagine that R is like the engine of a car and RStudio is like the driver’s cockpit, with a user-friendly dashboard and controls which make it easier to operate and get the most out of the car’s engine. Without an engine though, the driver’s cockpit would not be useful at all.

We can install RStudio by clicking the link below and following the instructions (once again, it is highly recommended to stick with the default options): https://posit.co/download/rstudio-desktop/

When we open RStudio for the first time, we get the following setup (3 panes):

The left pane shows the R console, which is the same as the one we saw when we opened R. The top right pane has some tabs, such as Environment and History, while the bottom right pane has other tabs, such as Plots and Help. We could explain what each tab shows, but it is much easier to understand this in practice, as we move along. For instance, in the Data Visualization with ggplot2 chapter, we will see that the visualizations we create appear in the Plots tab of the bottom right pane.

To start a new script, we click on File -> New File -> R Script (or press Ctrl+Shift+N on the keyboard). Now, RStudio should look like this:

We can type in the top left pane, which is called the Code Editor. Although we can type our commands directly in the R Console (now the bottom left pane), the Code Editor gives us more flexibility, since we can easily change our code or execute the same code repeatedly. To understand the difference, let us try the same calculation as before. After typing “2+2” in the Code Editor pane, we select the line of code and click on Run. We see that the console prints our code along with the result:

1.5 R Scripts

Earlier in this chapter, we mentioned that it is possible to save and share our R code or, as it is commonly known, our R script. Once a script is saved, we can reopen it, edit it and share it with others. For instance, suppose we want to save the current code locally, on our laptop. To do so in RStudio, we click on File -> Save As. We then choose a name and a location for the script:

We have now saved our script (in the location of our choice). If we later edit or update our code, we simply click on File -> Save. It is important to keep in mind, though, that RStudio overwrites the previously saved file with the new version.
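
Because a script is just a text file of R commands, a saved script can also be run as a whole. The following is only a sketch under stated assumptions: the file name my_first_script.R is hypothetical, the file is written to a temporary folder so that nothing on disk is overwritten, and writeLines(), tempdir() and source() are base R functions that are not otherwise covered in this chapter.

```r
# Hypothetical script name, placed in a temporary folder for this illustration
script_path <- file.path(tempdir(), "my_first_script.R")

# A script is a plain-text file containing R commands
writeLines(c("# My first script", "2 + 2"), script_path)

# Run every line of the saved script; echo = TRUE also prints the commands
result <- source(script_path, echo = TRUE)

# The value produced by the last command in the script
result$value
```

In everyday use we would create the file with File -> Save As, as described above, and only the source() call would be needed.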

1.6 Global Options

In RStudio, we can change the configuration settings, including the appearance. To check the available options, click on Tools -> Global Options. Click on Appearance, for example, to select a darker editor theme (this is just a personal preference).

However, there is one change in Global Options that is highly recommended. In the General section, we should adjust the following two settings:

  • Set Save workspace to .RData on exit to Never

  • Uncheck Restore .RData into workspace at startup

This change makes sense because, with the default settings, R can save all the objects we created during a session when we exit. If we keep these options active, R writes these objects (the workspace) to a file called .RData and loads them again the next time it starts. Although the problem is not immediately evident, this may lead to confusion: there is no reason to carry objects over to our next R session, since they will be loaded (or created) at the time we need them. Typically, we are only interested in preserving our scripts, and we have already discussed how to save and retrieve them.
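
To make the idea of "objects in the workspace" concrete, here is a minimal sketch. The object name my_result is hypothetical, and the assignment operator `<-` as well as the functions ls() and rm() are base R features that this chapter has not introduced.

```r
# Create an object; it now lives in the workspace of the current session
my_result <- 2 + 2

# The object is listed among the workspace contents
"my_result" %in% ls()
# [1] TRUE

# Remove it again; re-running the script recreates it whenever it is needed
rm(my_result)
"my_result" %in% ls()
# [1] FALSE
```

Objects like my_result are what would end up in .RData, which is why a script that recreates them is the better thing to keep.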

1.7 R Packages

When we install R and RStudio, we can start using them immediately. Many functions are readily available, such as the ifelse() function. For instance, typing the following code gives the corresponding results:
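
The calls below are an assumption chosen only for illustration; as a minimal sketch, ifelse() could be used like this:

```r
# If the test (first argument) is TRUE, return the second argument; otherwise the third
ifelse(5 > 3, "yes", "no")
# returns "yes"

# The test is applied to each element of a vector in turn
ifelse(c(1, 5, 10) > 4, "big", "small")
# returns "small" "big" "big"
```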

We will explain the ifelse() function later in this book, so there is no need to describe exactly what it does here; this was just an example to give a better sense of how R is used.

The standard functionality that comes with the installation of R is usually called base R. As we mentioned, R is a free and open-source programming language, which means there are many developers who have created their own contributions, or packages (and we can create our own as well!). In R, packages are collections of specialized tools and functions that extend its capabilities, allowing the researcher or analyst to perform a wide range of tasks, from data analysis and visualization to implementing specialized statistical techniques, machine learning, and more.

When we install R, some packages, such as stats, come along with the installation. However, one of the packages that we will use throughout this book, dplyr, has to be installed separately. To install this package and, by extension, any other package, we can simply use the function install.packages() and type the name of the package inside the parentheses, in single ('') or double ("") quotes:

# Install the dplyr package 
install.packages("dplyr")

After clicking Run, the console shows that the package was installed successfully. However, we cannot start using the package immediately. Installing means that the package exists on our computer as a file or set of files, but we have not yet “called” it for use in our script. After all, we may not want to use all of our installed packages in every script. To be able to use an installed package in the current R session, we need the function library(), with the name of the package inside the parentheses (we can still use single or double quotes, but they are not necessary in this case):

# Load the dplyr package 
library(dplyr)

Regarding this specific package, we will see in later chapters what exactly it can do and why we need it. For now, it is sufficient to briefly understand what a package is, why it is needed, how we can install one and how we can call it for use in our scripts.
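
The difference between an installed package and a loaded one can be sketched as follows. To keep the example runnable without an internet connection, it uses splines, a package that ships with R but is not loaded by default, rather than dplyr; requireNamespace(), search() and paste0() are base R functions not covered in this chapter, and the variable names are hypothetical.

```r
# "splines" ships with every R installation, so no install.packages() call is needed
pkg <- "splines"

# Installed? requireNamespace() returns TRUE if the package can be found on this computer
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Available in this session? Packages loaded with library() appear in search()
# under the name "package:<name>"
attached_before <- paste0("package:", pkg) %in% search()

# Load the package for use in the current session
library(splines)

attached_after <- paste0("package:", pkg) %in% search()

# Compare the state before and after the library() call
c(installed = is_installed, before = attached_before, after = attached_after)
```

In a fresh session we would expect the package to be installed but absent from search() until library() is called.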

We can see all the packages installed on our computer by using the following function:

# Print the installed packages 
installed.packages()
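
The full output of installed.packages() can be long. As a minimal sketch, assuming we only care about names and versions, the result can be stored and trimmed; the object name pkgs is hypothetical, and head() and rownames() are base R functions not covered in this chapter.

```r
# installed.packages() returns one row per package, named after the package
pkgs <- installed.packages()

# Show only the name and version of the first few packages
head(pkgs[, c("Package", "Version")])

# Check whether a particular package is installed; stats ships with R
"stats" %in% rownames(pkgs)
# [1] TRUE
```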

Lastly, while typing the functions mentioned above, we may have noticed another very useful feature of RStudio: when we type the first letters of a function, such as ‘insta’, RStudio automatically suggests possible completions that we can choose with the mouse (or the keyboard arrows):

In this way, even if we do not quite remember the full name of a function, we can benefit from this useful RStudio feature.
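
A similar lookup is possible with code. This is only a sketch of a console counterpart to the autocompletion feature, not a description of how RStudio implements it: the base R function apropos() lists the available names that match a pattern, and the `^` in the pattern means "starts with".

```r
# List every available function or object whose name starts with "insta"
apropos("^insta")
```

The result should include install.packages and installed.packages, the two functions used earlier in this section.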